1.  INTRODUCTION

There exist a variety of 3-D shape recovery techniques in computer vision
based on photometric constraints, stereoscopy, pattern regularity, surface
structure constraints and motion information [1]. These methods should play a
key role in robotics applications as the complexity of robot tasks increases.
In systems such as visual navigation systems [2-5], some assumptions or
constraints are needed to obtain 3-D information accurately, since “3-D shape
from ___” methods are very sensitive to noise. In man-made environments, the
flatness of the floor and the prevalence of vertical edges can be used to infer
the 3-D motion of the robot [2], or knowledge of the camera attitude can be
used to recover 3-D structure using stereo vision [3]. Many outdoor robots have
used monocular inverse perspective techniques to recover the 3-D road
structure. They employed assumptions about, for example, road width, camera
position and smoothness of the road surface [4, 5]. Those methods are very
sensitive to even small deviations from these assumptions. Little attention has
been directed to the problem of estimating the effect of various errors
(calibration, discrepancies between model assumptions and true conditions,
etc.) on 3-D reconstruction [6].

In this paper, we propose a new method for reconstructing the 3-D structure
of road boundaries from consecutive images. First, we present a method for
estimating depth information; this is a motion stereo method applied to
consecutive images, given an estimate of the interframe motion. The relation
between depth, motion and disparity is investigated, since the accuracy of the
depth depends on the disparity range. Next, the error of the estimated road
structure due to quantization errors and motion estimation error is examined.
Finally, a representation for road boundaries is proposed that makes explicit
the error of the road edge location in 3-D space. Experimental results are
shown for an input image sequence taken by the Autonomous Land Vehicle (ALV)
simulator robot [5].

2.  THE  PRINCIPLE  OF  MOTION  STEREO  USED  IN  THE  METHOD 

Our  motion  stereo  analysis  is  based  on  the  following  assumptions. 

(1)  The  road  boundaries  in  consecutive  images  are  detected  and  the 
correspondence  between  frames  is  established. 

(2)  The  robot  motion  is  known. 

Algorithms  [5,  7]  have  been  developed  for  road  boundary  detection  in  images. 
Our  program  uses  the  output  of  one  such  algorithm.  Also,  given  the  interframe 
motion,  determining  the  correspondence  between  road  boundaries  in  consecutive 
frames  is  straightforward.  Internal  sensors  can  be  employed  for  accurately 
estimating  the  motion  of  the  robot. 

We outline the depth recovery algorithm as follows. Given the interframe
motion, the epipolar line in the second image corresponding to any point P in
the first image can be determined by projecting the epipolar plane determined
by P and the two camera centers onto the second image [8]. We can, however,
determine this epipolar line more directly. The known motion consists of a
translation T = (U,V,W) and a rotation $R = (\alpha, \beta, \gamma)$. The
displacement vector of a feature point in the first frame consists of two
vectors t (translation) and r (rotation) in the image plane. While r is
uniquely determined when R is known, t has one degree of freedom: its length.
The direction of t is constrained so as to be toward the focus of expansion
(FOE, the intersection of T with the image plane [9]). Thus, we can determine
the epipolar line as follows:

(1) Transform the feature point $P_1$ in the first frame by the rotation r. The
displaced location is denoted by $P_1'$ (see Figure 1).

(2) Draw a line from $P_1'$ to the FOE.

(3) The line segment $P_1'$-FOE is the desired epipolar line. The corresponding
point in the second frame must lie on this line segment (from $P_1'$ to FOE)
when W < 0 or on the half line from $P_1'$ to Q (see Figure 1) when W > 0.

This method for constructing the epipolar line does not require calculation of
the equation of the epipolar plane in 3-D space. Furthermore, the constraint
(3) is not explicit in previous methods [8].
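The three-step construction can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the point has already been displaced by the rotation (it is the point P1' of step (1)), image coordinates are measured from the image origin, W is nonzero, and the FOE is taken to lie at (fU/W, fV/W). The function names are hypothetical and are not part of the original program.

```python
import math

def focus_of_expansion(f, U, V, W):
    """FOE in image coordinates for camera translation (U, V, W), W != 0."""
    return (f * U / W, f * V / W)

def displaced_point(p1r, foe, W, Z):
    """Second-frame position of a rotation-compensated point at depth Z.

    Follows from the pinhole projection of a point translated by (-U, -V, -W):
    the displacement is (W / (Z - W)) * (p1r - foe).
    """
    k = W / (Z - W)
    return (p1r[0] + k * (p1r[0] - foe[0]), p1r[1] + k * (p1r[1] - foe[1]))

def on_admissible_part(p1r, foe, p2, W, tol=1e-6):
    """True if p2 lies on the part of the epipolar line allowed by step (3)."""
    dx, dy = p1r[0] - foe[0], p1r[1] - foe[1]   # direction away from the FOE
    ex, ey = p2[0] - p1r[0], p2[1] - p1r[1]     # candidate displacement
    cross = dx * ey - dy * ex                   # zero when p2 is on the line
    scale = max(1.0, math.hypot(dx, dy) * math.hypot(ex, ey))
    if abs(cross) > tol * scale:
        return False
    k = (ex * dx + ey * dy) / (dx * dx + dy * dy)
    # W > 0: half line leaving the FOE; W < 0: segment between p1r and the FOE.
    return k >= 0.0 if W > 0 else -1.0 <= k <= 0.0
```

Restricting the search to the admissible part of the line halves the set of candidate matches before any intersection with the road boundary is computed.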

The translation vector $t = (d_x, d_y)$ is easily determined by finding the
intersection between the epipolar line segment and the road boundary in the
second frame (see Figure 1). The 3-D coordinates of the feature point can be
recovered from the translation vector $(d_x, d_y)$. For simplicity, we assume
that the first frame has already been corrected for the known rotation R;
therefore, only the translation (U,V,W) needs to be considered. The camera
translation (U,V,W) can be interpreted as a translation (-U,-V,-W) of the
feature point with respect to a fixed camera. Therefore, the coordinates of the
feature point $(x_1, y_1)$ in the first frame and $(x_2, y_2)$ in the second
frame are given as

$$x_1 = \frac{fX}{Z}, \quad y_1 = \frac{fY}{Z}, \quad x_2 = \frac{f(X-U)}{Z-W}, \quad \text{and} \quad y_2 = \frac{f(Y-V)}{Z-W}, \tag{2.1}$$

where f denotes the focal length of the camera and (X,Y,Z) represent the 3-D
coordinates of the point in the first camera coordinate system. From eqns (2.1),

$$d_x = x_2 - x_1 = -\frac{fU}{Z-W} + \frac{fXW}{Z(Z-W)}$$

and

$$d_y = y_2 - y_1 = -\frac{fV}{Z-W} + \frac{fYW}{Z(Z-W)}. \tag{2.2}$$

Finally, we obtain the 3-D coordinates:

$$Z = -\frac{fU}{d_x} + W\left(\frac{x_1}{d_x} + 1\right) \quad \text{or} \quad Z = -\frac{fV}{d_y} + W\left(\frac{y_1}{d_y} + 1\right) \tag{2.3}$$

and

$$X = \frac{x_1 Z}{f}, \qquad Y = \frac{y_1 Z}{f}.$$


The  recovery  of  road  structure  from  motion  stereo  does  not,  of  course, 
require  prior  knowledge  of  camera  height,  tilt  angle  or  road  width  as  do  most 
monocular  inverse  perspective  methods. 
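As a check on the relations of this section, the following minimal Python sketch projects a point into two frames and recovers its 3-D coordinates from the disparity using the depth formula (2.3). It assumes a rotation-corrected first frame and image coordinates measured from the image origin. Choosing whichever disparity component is larger in magnitude is an assumption made here for numerical stability, not a rule stated in the text, and the function names are illustrative.

```python
def project(X, Y, Z, f):
    """Pinhole projection of a 3-D point onto the image plane."""
    return (f * X / Z, f * Y / Z)

def recover_point(p1, p2, f, U, V, W):
    """Recover (X, Y, Z) from matched image points p1, p2 and motion (U, V, W)."""
    x1, y1 = p1
    dx, dy = p2[0] - x1, p2[1] - y1
    if abs(dy) >= abs(dx):
        # Depth from the vertical disparity.
        Z = -f * V / dy + W * (y1 / dy + 1.0)
    else:
        # Depth from the horizontal disparity.
        Z = -f * U / dx + W * (x1 / dx + 1.0)
    return (x1 * Z / f, y1 * Z / f, Z)

# Usage: a point 5 units ahead, camera moving mostly forward.
f = 800.0
p1 = project(0.5, 1.0, 5.0, f)
p2 = project(0.5 - 0.02, 1.0 - 0.0, 5.0 - 0.1, f)  # point seen after the motion
print(recover_point(p1, p2, f, 0.02, 0.0, 0.1))
```

No camera height, tilt angle or road width enters the computation; only the focal length and the interframe translation are needed.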

3.  PROPERTIES  OF  THE  DISPARITY  FOR  A  MOBILE  ROBOT 

We consider how the disparity relates to the tilt angle and the motion.
Figure 2 shows the parameters of our computer simulation. The image size is
512×480. The visual angle of the camera is approximately 33 degrees,
corresponding to about 800 pixels of equivalent focal length. The tilt angle of
the camera is approximately 23 degrees. The camera height is approximately
50 mm; we use this length as unit length in the simulations. In the following,
we set U=V=0 to simplify the analysis; these would be zero if the robot moved
straight on a flat surface.

(a)  Disparity  vs.  Tilt  Angle 

When U=V=0, the disparity $(d_x, d_y)$ is given by

$$d_x = \frac{fXW}{Z(Z-W)} = \frac{x_1 W}{Z-W}, \qquad d_y = \frac{fYW}{Z(Z-W)} = \frac{y_1 W}{Z-W}.$$

Here we need consider only the disparity $d_y$ because the tilt angle does not
affect X and U. If we assume a flat surface and no tilt ($\theta = 0$), Y = h
(camera height); therefore

$$d_y = \frac{fhW}{Z(Z-W)}. \tag{3.1}$$

If the camera has a tilt angle $\theta$, then the camera has an apparent
translation component in the direction of the y-axis, although the robot has no
translation component in the vertical direction (V=0); therefore, eqn (3.1)
becomes

$$d_y = \frac{fhW}{(h\sin\theta + Z\cos\theta)(h\sin\theta + Z\cos\theta - W\cos\theta)}. \tag{3.2}$$


(In the following, (U,V,W) refers to the robot translation. See Appendix I for
the relation between (U,V,W) and equation (3.2).) Figure 3 displays the
relationship between disparity and tilt angle when W = 0.1 (h = 1: unit
length). The horizontal axis corresponds to the distance S between the camera
and the ground, and the vertical axis to disparity $d_y$. The bar graphs below
these curves represent the extent of the visual fields, which depend on both
$\theta$ and the vertical field of view of the camera. Based on these graphs,
we can make the following observations.

1) For a fixed S, the tilt angle does not have much effect on $d_y$.

2) As $\theta$ increases, the visual field is compressed towards the camera,
and near the camera, where disparity $d_y$ is large, accurate depth information
can be obtained.

In the following, the tilt angle is fixed at 23 degrees, so that the vertical
visual field extends from 1.4 to 8.7 units in front of the camera when W = 0.1.
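The dependence of the vertical disparity on tilt angle and distance can be tabulated directly from equation (3.2). The sketch below is only illustrative: it uses the simulation values quoted in this section (f about 800 pixels, h = 1, W = 0.1), takes Z as the horizontal distance that appears in (3.2), and the function name is hypothetical.

```python
import math

def disparity_dy(f, h, W, Z, theta_deg):
    """Vertical disparity of a ground point for forward robot motion W.

    f: focal length in pixels, h: camera height, Z: distance along the ground,
    theta_deg: camera tilt angle in degrees.
    """
    theta = math.radians(theta_deg)
    z_cam = h * math.sin(theta) + Z * math.cos(theta)  # depth along the optical axis
    return f * h * W / (z_cam * (z_cam - W * math.cos(theta)))

# Disparity across the visual field quoted for a 23 degree tilt.
for Z in (1.4, 3.0, 8.7):
    print(Z, round(disparity_dy(800.0, 1.0, 0.1, Z, 23.0), 2))
```

With zero tilt the function reduces to the flat-ground, untilted expression (3.1), which gives a quick consistency check.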

(b)  Disparity  vs.  Motion 

In an ordinary stereo vision system (e.g. [10]), where the cameras' lines of
sight are parallel and the base line is perpendicular to the line of sight,
disparity $d_x$ is proportional to the length of the base line (U):

$$d_x = -\frac{fU}{Z} \quad (V = W = 0). \tag{3.3}$$

Next, we examine the relationship between disparity and motion W. From (3.2),
the disparity when W = w is

$$d_y^{(w)} = \frac{fhw}{Z'(Z' - w\cos\theta)},$$

where $Z' = h\sin\theta + Z\cos\theta$. If we increase the motion w by a factor
of n (W = nw),

$$d_y^{(nw)} = \frac{fhnw}{Z'(Z' - nw\cos\theta)}.$$

If F is the ratio of the disparity for velocity nw to the disparity at velocity
w, then

$$F = \frac{d_y^{(nw)}}{d_y^{(w)}} = \frac{n(Z' - w\cos\theta)}{Z' - nw\cos\theta}. \tag{3.4}$$

Figure 4 shows curves of F in terms of n with w = 0.1. We can make the
following observations.

1) From this graph and (3.4), $F \approx n$ when $w \ll Z'$.

2) In an ordinary stereo system, the common visual field of the two cameras
changes from 70% to 30% when the base line changes from U = 0.1 to U = 0.2.
In motion stereo, the common visual field of two frames changes only from 95%
to 84% when the motion changes from W = 0.1 to W = 0.2.
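The first observation can be checked numerically by taking the ratio of the disparity (3.2) evaluated at W = nw and at W = w. The following Python sketch does this under the assumption that Z' = h sin θ + Z cos θ is given directly; the function name and the sample values are illustrative only.

```python
import math

def disparity_ratio(n, w, z_prime, theta_deg):
    """Ratio of the disparity for motion n*w to the disparity for motion w.

    z_prime is the depth of the point along the optical axis, and
    theta_deg is the camera tilt angle in degrees.
    """
    c = math.cos(math.radians(theta_deg))
    return n * (z_prime - w * c) / (z_prime - n * w * c)

# With w = 0.1 and a point 5 units away, doubling the motion roughly
# doubles the disparity.
print(disparity_ratio(2, 0.1, 5.0, 23.0))
```

The ratio exceeds n slightly because the camera also moves closer to the point; the excess vanishes as w becomes small compared with Z'.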


4.  ERROR  ANALYSIS  OF  ESTIMATED  DEPTH  IN  3-D  SPACE 


The estimated depth information has errors due both to quantization errors
of edge locations and to estimation errors of robot motion. In this section, we
consider the errors in the depth information in 3-D space.

4.1.  Depth Error due to Quantization Error in Edge Location

Since the disparity is determined from the difference of the corresponding
feature points in two frames, inaccuracy of the disparity results from
localization errors of feature points in the image plane. If a point has a
quantization error $e_y^{(i)}$ in the ith frame (i = 1,2), the disparity has a
total quantization error $e_y$ that combines the errors of the two frames.
Figure 5 shows a side view of the depth error due to quantization error of edge
location. An actual point must exist in a rectangle PQRS (a solid in 3-D space)
which is obtained as a common region of two thick rays from the first and the
second camera positions. Since PR is very small compared to the length QS, we
approximate this rectangle as a “stick” QS in 3-D space.

An important property of the depth errors due to quantization errors is that
they are independent of one another.


4.1.1.  Relative  Depth  Errors 

We can estimate the relative errors in the depth information by our method
as follows.

From (2.3),

$$Z = -\frac{fV}{d_y} + W\left(\frac{y_1}{d_y} + 1\right) = \frac{1}{d_y}(-fV + W y_2).$$

If the disparity error is uniformly distributed in the interval
$[-e_y, +e_y]$, then the maximal overestimated depth $Z_o$ and the minimal
underestimated depth $Z_u$ are given by

$$Z_o = \frac{1}{d_y - e_y}(-fV + W y_2) \quad \text{and} \quad Z_u = \frac{1}{d_y + e_y}(-fV + W y_2).$$

$\Delta Z$, the expected error in Z, is thus

$$\Delta Z = \frac{Z_o - Z_u}{2} = \frac{e_y}{d_y^2 - e_y^2}(-fV + W y_2).$$

Therefore, the relative error $\Delta Z / Z$ is given by

$$\frac{\Delta Z}{Z} = \frac{e_y d_y}{d_y^2 - e_y^2}.$$

When $e_y \ll d_y$, $\Delta Z / Z \approx e_y / d_y$: the relative error of the
depth value is inversely proportional to the disparity value.
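The depth interval implied by a bounded disparity error can be computed directly from the overestimated and underestimated depths. The Python sketch below is a minimal illustration assuming a positive disparity larger than the error bound; taking half the width of the interval as the error measure is an assumption of this sketch, and the function names are hypothetical.

```python
def depth_interval(f, V, W, y2, dy, ey):
    """Return (Z_under, Z, Z_over) for a disparity dy known to within +/- ey.

    Assumes dy > ey >= 0, so that all three depths have the same sign.
    """
    if not 0.0 <= ey < dy:
        raise ValueError("expected 0 <= ey < dy")
    base = -f * V + W * y2
    return base / (dy + ey), base / dy, base / (dy - ey)

def relative_half_width(dy, ey):
    """Half-width of the depth interval divided by the nominal depth."""
    return ey * dy / (dy * dy - ey * ey)

# Usage: a 1 pixel disparity error on a disparity of about 3.3 pixels.
z_u, z, z_o = depth_interval(800.0, 0.0, 0.1, 163.265306, 3.265306, 1.0)
print(z_u, z, z_o, relative_half_width(3.265306, 1.0))
```

For a small error bound the relative half-width approaches ey / dy, so halving the disparity roughly doubles the relative depth error.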


4.1.2.  Results  of  Simulation 

We applied our method to synthetic road boundaries. In the following
simulation, we used the same camera parameters as in Section 3 (tilt angle is
fixed at 23 degrees). Figure 6(a) shows the road boundaries extracted from the
first frame (+) and the second frame (□) with translation (U,V,W) = (0,0,0.4)
and rotation $(\alpha, \beta, \gamma) = (0, -5.0°, 0)$ (steering to the right).
The locations of feature points on the road boundary are quantized in a 512×480
pixel image. Figure 6(b) shows the 3-D road structure in the world coordinate
system from the top and side. Large circles in the top view show the locations
of the robot in the first frame (+) and in the second frame (small □). Figure 7
shows the results of reconstruction. Short line segments (which we will refer
to as “sticks”) with small circles at both ends represent intervals of edge
location in 3-D when $e_y$ = 1 pixel. The true location (+) lies on the
corresponding stick. The nearer the point is to the viewer, the shorter the
stick. The relative depth error was very nearly inversely proportional to
disparity, as mentioned above.


4.2.  Depth  Error  due  to  Estimation  Error  of  Robot  Motion 

So far we have reconstructed the depth information assuming that the
motion parameters are accurate. In practice, however, the motion parameters
might not be very accurate. Therefore, analysis of the effect on the estimated
3-D structure caused by the errors in the motion parameters is very important.
Here, we consider the inaccuracy of the translational motion and do not deal
with that of rotation. We discuss the effect of the inaccuracy of rotational
motion in Section 5.

When the translation components of the robot motion have some errors,
their effect on the depth value is not local as in Section 4.1 but global to
the scene because the translational components (U,V,W) are common to all points
in the scene. Figure 8 shows the perspective and side views of this process.
Obviously, the error $\Delta V$ in the vertical component of translation has a
large effect on the estimated depth because the angle between the y-axis of the
camera and the line of sight to the point is large.

4.2.1.  Experimental  Results 

We used the ALV simulator robot in the Computer Vision Laboratory at the
University of Maryland to input road images. The motion of the ALV can be
simulated by the robot arm, which is program controllable from a VAX-11/785
(see [5] for more detail). The input images are quantized into 512×480 pixel
images. Figures 9(a) and (b) show the input road images, where the black
squares beside the road are used to determine the location of the FOE from
consecutive images. The most serious problem is camera calibration. We obtained
the internal camera parameters (the focal length and the image center) by
moving the camera a known direction and distance (see Appendix II for more
detail). The road boundaries are easily obtained by the methods of [5] or [7]
and are fitted to line segments. Top views of the road boundary space curves
estimated from various components of translation are shown in Figures 10(a),
(b) and (c) for U, V and W with various errors. Each component changes from -5%
to +5% of the traveled distance such that the traveled distance is constant
(1 inch). From Figure 10(b), we can see that the V component has a large effect
on the depth value, as illustrated in Figure 8.
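The sensitivity to the vertical component can be illustrated with a small Python experiment on a single point. This is only a sketch under simplifying assumptions: the camera is untilted, the true motion is purely forward, depth is recovered from the vertical disparity as Z = -fV/dy + W(y1/dy + 1), and the error is applied to one component at a time without renormalizing the traveled distance. The function names and numerical values are illustrative.

```python
def depth_from_dy(f, V, W, y1, dy):
    """Depth recovered from the vertical disparity for assumed motion (V, W)."""
    return -f * V / dy + W * (y1 / dy + 1.0)

def depth_errors(f, Y, Z, W, frac):
    """Depth error caused by an error of frac * W in V, and separately in W.

    The true motion is (0, 0, W); the point lies at height offset Y, depth Z.
    """
    y1 = f * Y / Z              # first-frame image coordinate
    y2 = f * Y / (Z - W)        # second-frame image coordinate
    dy = y2 - y1                # observed (error-free) disparity
    err_v = depth_from_dy(f, frac * W, W, y1, dy) - Z
    err_w = depth_from_dy(f, 0.0, W * (1.0 + frac), y1, dy) - Z
    return err_v, err_w

# A 5% error on a forward motion of 0.1 units, point 5 units ahead.
print(depth_errors(800.0, 1.0, 5.0, 0.1, 0.05))
```

In this setting the same absolute error is amplified by a factor f / y2 when it occurs in V rather than in W, which is large whenever the point is imaged near the optical axis.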

Although  our  method  does  not  require  knowledge  of  road  width,  camera 
height  or  tilt  angle  of  the  camera,  these  values  are  very  useful  for  verifying  the 
result  of  the  method.  From  the  estimated  range  of  depth  information,  we  can 
obtain  the  following  parameters: 

Camera height: from 1.9 inches to 2.3 inches (actually, about 2 inches)

Tilt angle: from 20° to 24° (actually, about 23°)

Road width: from 2.4 inches to 3 inches (actually, about 2.6 inches)

As a result, the error range is very large although the actual values are
included in this range.

5.  DISCUSSION 

We have described a new method for reconstructing the depth information
of a road boundary from motion stereo and for representing the error of a 3-D
road structure in 3-D space. The depth errors due to inaccuracy of motion
estimation are especially important because the errors in the translational
components change the depth values of all points in the same way. In this
paper, we have not considered the effect of the rotational movement on the
depth value. A rotational motion about the X- or Y-axis of 1° causes a motion
of 10 pixels in the image plane if the same focal length lens is used (about
800 pixels of equivalent focal length). Displacement due to a rotation around
the Z-axis depends on the distance from the image center and the maximum
displacement is 4 or 5 pixels at the edge of the image. Therefore, an error in
rotation estimation can cause serious errors in disparity. As mentioned in
Section 4, the error range of depth information is relatively large if we use
only two frames. Therefore, we need to use many frames and/or other sensor
outputs in order to refine the depth information. The final goal of our
research is to obtain a more accurate 3-D road structure by fusing the error
ranges of depth information from many more frames and/or from range sensors in
3-D space.
Appendix I: DISPARITY WITH TILT ANGLE


When the camera has some tilt angle, the translation of the robot is not
equivalent to that of the camera. We denote the translation of the robot by
$(U_w, V_w, W_w)$ and that of the camera by $(U_c, V_c, W_c)$. The relations
between them are given by

$$U_c = U_w, \quad V_c = W_w\sin\theta + V_w\cos\theta, \quad W_c = W_w\cos\theta - V_w\sin\theta,$$

where $\theta$ is the tilt angle of the camera. Similarly, denote the distance
to the point and the camera height from the viewpoint of the robot by
$(Z_w, h_w)$ and those from the viewpoint of the camera by $(Z_c, h_c)$. The
relations between them are given by

$$Z_c = Z_w\cos\theta + h_w\sin\theta, \quad h_c = -Z_w\sin\theta + h_w\cos\theta.$$

Substituting $(U_c, V_c, W_c, Z_c, h_c)$ for (U,V,W,Z,h) in eqn (2.2), we
obtain eqn (3.2).


Appendix II: CAMERA CALIBRATION FOR MOTION STEREO
A.  Basic Assumptions

Camera calibration in the context of 3-D machine vision is the process of
determining the internal camera geometric and optical characteristics (internal
camera parameters) and/or the 3-D position and orientation of the camera frame
relative to a certain world coordinate system (external camera parameters).
(The current state of the art is well described in [11].) We consider obtaining
only the internal camera parameters using the motion of the robot arm to which
the camera is attached, since the motion stereo method does not need external
parameters to obtain depth information.

The  camera  calibration  method  described  here  is  based  on  the  following 
assumptions. 

(1) There is no lens distortion: Although it exists even if we use a very
accurate lens, we do not consider lens distortion here in order to simplify the
method.

(2) The motion parameters of the robot arm are known accurately: We
used an American Robot MERLIN robot arm to which a small solid state CCD
black and white camera is attached. Theoretically, this robot can position its
arm to within 0.001 inch of a previously defined point ([12]). Therefore, we
can expect that the motion parameters are accurate enough for our purpose.

(3) The geometrical relation between the camera and the robot arm is
known: The camera position and orientation relative to the robot arm have been
obtained ([5]) using the geometrical parameters of the road simulation board.
Of course, there is a slight difference between the true and estimated camera
axes (we call the optical axis of the camera the camera axis); therefore, the
image origin of the estimated camera axis is different from the true one.
However, there is an interesting remark in [11] that the fitting between the
observation and model is still good although different setups yield different
values of the image origin for the same camera and lens. Although we cannot
prove theoretically that the image origin has a negligible effect on the
accuracy of the final 3-D measurement, we can compute the expected depth error
due to this by simulation. Figure A-1 shows the side view of the true and
estimated camera axes. A point P in 3-D space is projected on the true image
plane at $y_P^t$, but we observe this point on the estimated image plane as
$y_P^e$. The relation between these y-coordinates is given as follows when the
angle between these camera axes is $\theta$:

$$y_P^e = \frac{y_P^t f}{f\cos\theta - y_P^t\sin\theta},$$

where f denotes the focal length. Similarly, the relation between the true
disparity $d_y^t$ and the estimated disparity $d_y^e$ of two points P and Q in
3-D space is given by

$$d_y^e = d_y^t \, \frac{f^2\cos\theta}{(f\cos\theta - y_P^t\sin\theta)(f\cos\theta - y_Q^t\sin\theta)}.$$

Figures A-2(a) and (b) show the relative error of the depth value estimated
from the above equation when the translation W in the Z-axis direction is 0.25
and 0.5 (in the same units as in Section 4), respectively. The horizontal axis
is the distance from the camera to the point, and the vertical axis is the
relative error of depth. The hatched region shows the error due to the
quantization error estimated in Section 4.1. From these figures, we can regard
the error due to the orientation error of the camera axis as negligible as long
as the angle between the true and estimated camera axes is very small, for
example, less than 2° (1° corresponds to about a 10 pixel shift of the image
origin).
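The effect of a small axis misalignment on the disparity can be evaluated numerically. The Python sketch below assumes that a true image coordinate y is observed as f y / (f cos θ - y sin θ), which is the form used in this appendix, and computes the factor by which the disparity between two points is scaled. The function names and sample values are illustrative only.

```python
import math

def observed_y(y_true, f, theta_deg):
    """Image coordinate observed when the camera axis is off by theta_deg."""
    t = math.radians(theta_deg)
    return f * y_true / (f * math.cos(t) - y_true * math.sin(t))

def disparity_scale(y_p, y_q, f, theta_deg):
    """Factor by which the true disparity y_p - y_q is multiplied."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    return f * f * c / ((f * c - y_p * s) * (f * c - y_q * s))

# A 2 degree misalignment changes a disparity near y = 160 by about 1%.
print(disparity_scale(163.27, 160.0, 800.0, 2.0))
```

Because the factor stays close to one for angles of a degree or two, the resulting change in disparity is small compared with a one pixel quantization error at these disparities.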

B.  Determining  the  Internal  Camera  Parameters 

From the viewpoint of optical flow analysis, we can determine the internal
camera parameters by moving the camera a known distance in a known direction.
When the camera moves in 3-D space with a translation (U,V,W), the coordinates
of the Focus of Expansion $(x_{FOE}, y_{FOE})$ are given as follows ([3]):

$$x_{FOE} = \frac{fU}{W}, \qquad y_{FOE} = \frac{fV}{W}. \tag{A-1}$$

Thus, we can determine the image origin and the focal length by moving the
camera straight (U=V=0) and diagonally (U=V=W).

(a)  Image  origin 

First, we determine the coordinates of the image origin $(c_x, c_y)$ by moving
the camera with a translation (0.0, 0.0, 4.0) (in the following, translations
are given in inches). Figure A-3 shows the position of the image origin,
which is obtained as the FOE of the translation. The specification of the
feature points and their correspondences between two frames are performed by
hand.

The image size is 512×480 and the obtained coordinates of the image origin are
$(c_x, c_y) = (273.59, 233.96)$.
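Locating the FOE from hand-matched feature points amounts to finding the point closest to all flow lines. The text does not say how the intersection was computed, so the following Python sketch is one possible approach and not the procedure actually used: it solves the 2×2 least-squares problem for the point minimizing the squared perpendicular distances to the lines through each matched pair. The function name is hypothetical.

```python
import math

def estimate_foe(flows):
    """Least-squares intersection of flow lines.

    flows: list of ((x1, y1), (x2, y2)) matched points in two frames.
    Returns the point minimizing the sum of squared distances to the lines.
    """
    sxx = sxy = syy = bx = by = 0.0
    for (x1, y1), (x2, y2) in flows:
        dx, dy = x2 - x1, y2 - y1
        norm = math.hypot(dx, dy)
        nx, ny = -dy / norm, dx / norm      # unit normal of the flow line
        c = nx * x1 + ny * y1               # line equation: nx*x + ny*y = c
        sxx += nx * nx
        sxy += nx * ny
        syy += ny * ny
        bx += nx * c
        by += ny * c
    det = sxx * syy - sxy * sxy             # zero if all flows are parallel
    return ((syy * bx - sxy * by) / det, (sxx * by - sxy * bx) / det)
```

The determinant becomes small when the flow vectors are nearly parallel, which is one way to see why an FOE far from the observed points is poorly constrained.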
(b) Focal length

Theoretically, determining the focal length is straightforward, but there are
two unknowns: the aspect ratio r of the quantized pixel in the vertical
direction to the pixel in the horizontal direction, and the roll angle $\gamma$
between the true and estimated camera coordinate systems. Figures A-4(a) and
(b) are flow images with translations (4.0, 0.0, 0.0) and (0.0, 4.0, 0.0),
respectively, and they show the existence of the roll angle between the true
and estimated camera coordinate systems. Then, in order to obtain the focal
length $f_x$ in the horizontal direction and $f_y$ in the vertical direction
($f_x = r f_y$), we determine the aspect ratio r and the roll angle $\gamma$
as follows:


(1) Make four flow images with translations (±t, ±t, t) (t ≠ 0) and determine
the coordinates of the FOE for each translation. Their positions and
coordinates are denoted as P, Q, R, S and $(x_i, y_i)$ (i = P,Q,R,S) in
Figure A-5.

(2) Determine r so that angle(∠POQ) = $\pi/2$ (or ∠QOR, ∠ROS, ∠SOP = $\pi/2$).
From the scalar product of the two vectors OP and OQ:

$$(x_P - c_x)(x_Q - c_x) + r(c_y - y_P) \cdot r(c_y - y_Q) = 0.$$

(3) Obtain the roll angle $\gamma$ as the difference between $\pi/4$ and
angle(∠XOP). The angle(∠XOP) is given by

$$\mathrm{angle}(\angle XOP) = \tan^{-1}\left(\frac{c_y - y_P}{x_P - c_x}\right).$$

(4) Determine the focal lengths $f_x$ and $f_y$ by rotating P (or Q, R, S) by
the roll angle $\gamma$.
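Steps (2) to (4) can be sketched in Python as follows. The sketch rests on several assumptions that go beyond the text: P and Q are taken to be the FOEs for translations (t, t, t) and (-t, t, t), the angle of OP is measured after scaling the vertical offset by r, the roll angle is defined as that angle minus 45 degrees, and the focal lengths follow from the length of the scaled vector OP. The function names and the synthetic data generator are hypothetical.

```python
import math

def calibrate_from_diagonal_foes(center, foe_p, foe_q):
    """Estimate (r, roll angle in degrees, fx, fy) from two diagonal FOEs."""
    cx, cy = center
    ap, bp = foe_p[0] - cx, cy - foe_p[1]   # offsets of P from the image origin
    aq, bq = foe_q[0] - cx, cy - foe_q[1]   # offsets of Q from the image origin
    # Step (2): orthogonality of OP and OQ after scaling the vertical axis by r.
    r = math.sqrt(-(ap * aq) / (bp * bq))
    # Step (3): roll angle as the deviation of OP from the 45 degree diagonal.
    gamma = math.atan2(r * bp, ap) - math.pi / 4
    # Step (4): for translation (t, t, t) the scaled vector OP has length fx*sqrt(2).
    fx = math.hypot(ap, r * bp) / math.sqrt(2.0)
    return r, math.degrees(gamma), fx, fx / r

def synthetic_foe(center, fx, fy, gamma_deg, u, v):
    """FOE for translation direction (u, v, 1) seen by a camera rolled by gamma."""
    g = math.radians(gamma_deg)
    ur = u * math.cos(g) - v * math.sin(g)
    vr = u * math.sin(g) + v * math.cos(g)
    return (center[0] + fx * ur, center[1] - fy * vr)

# Usage: generate two FOEs from known parameters and recover them.
center = (273.59, 233.96)
P = synthetic_foe(center, 614.5, 780.7, -1.26, 1.0, 1.0)
Q = synthetic_foe(center, 614.5, 780.7, -1.26, -1.0, 1.0)
print(calibrate_from_diagonal_foes(center, P, Q))
```

The round trip on synthetic data only confirms that the sketch is self-consistent; with measured FOEs the estimates inherit the scatter of the flow-line intersections.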

Figures A-6(a), (b), (c) and (d) show the flow images with translations (±2.5,
±2.5, 2.5) and their FOEs. Compared with the FOE of the straight motion
(Figure A-4), the variance of the intersections with flow is large. As a
result, we obtain the parameters as follows:

$r = 0.787$, $\gamma = -1.26°$, $f_x = 614.5$, $f_y = 780.7$.

C.  Concluding  remarks 

We have described a camera calibration method for motion stereo. In this
method, we obtained the internal camera parameters: the image origin and the
focal length. Since we used only relative values of the robot motion (that is,
we did not measure absolute values), and since the estimated depth includes
various errors as described in Section 4, estimating the accuracy of the
calibrated parameters seems difficult. From our experiment, we can conclude
that:

(1) To obtain a more accurate focal length, we need to input many more
images with various translations and determine the FOE with a smaller
variance. As mentioned above, the variance of the FOE with diagonal
translation is large, say 10 pixels, because the angles between flows are
small.

(2) We should consider lens distortion. The difference between the
coordinates of the FOE obtained from the robot motion parameters and that
from feature point matching in two frames is about 4% of the traveled
distance. Since the motion parameters of the robot arm are very accurate,
this error seems to occur due to quantization error and lens distortion. If
we made a calibration table for lens distortion, we could obtain a better
estimate of depth value.
Figure 1  Epipolar line in motion stereo
Figure 2  Geometrical parameters of the ALV
Figure 3  Disparity curves vs. tilt angles
Figure 4  Factor curves in terms of motion parameters
Figure 5  Side view of depth errors due to quantization error in edge location
Figure 6  Road boundaries with translation and rotation
Figure 7  Simulation result for Figure 6
Figure 8  Space curves of road boundary estimated from erroneous translation
Figure 9  Input images taken by ALV simulator
Figure 10  Experimental results
Figure A-1  The relation between the true and estimated camera axes
Figure A-2  Relative errors of depth due to the difference angles between the
true and estimated camera axes
Figure A-3  The image origin determined by straight motion
Figure A-4  Flow images with vertical and horizontal translations
Figure A-5  Four FOEs by four diagonal translations
Figure A-6  Four FOEs from experiments